Using Multiple Sources of Information for Constraint-Based Morphological Disambiguation
Author: Gökhan Tür
Abstract
Gökhan Tür, M.S. in Computer Engineering and Information Science. Supervisor: Asst. Prof. Kemal Oflazer. July 1996.
This thesis presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology, specifically agglutinative languages with productive inflectional and derivational morphological phenomena. For morphologically complex languages like Turkish, automatic morphological disambiguation involves selecting, for each token, the morphological parse(s) with the right set of inflectional and derivational markers. Our system combines corpus-independent hand-crafted constraint rules, constraint rules that are learned via unsupervised learning from a training corpus, and additional statistical information obtained from the corpus to be morphologically disambiguated. The hand-crafted rules are linguistically motivated and tuned to improve precision without sacrificing recall. In certain respects, our approach has been motivated by Brill's recent work, but with the observation that his transformational approach is not directly applicable to languages like Turkish. Our system also takes a novel approach to unknown word processing by employing a secondary morphological processor which recovers any relevant inflectional and derivational information from a lexical item whose root is unknown. With this approach, well below […] of the tokens remain unknown in the texts we have experimented with. Our results indicate that by combining these hand-crafted, statistical and learned information sources, we can attain a recall of […] to […] with a corresponding precision of […] to […] and ambiguity of […] to […] parses per token.
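The evaluation terms in the abstract (recall, precision, ambiguity in parses per token) can be made concrete with a minimal sketch. It assumes the definitions commonly used for this kind of disambiguator: recall is the share of tokens whose correct parse survives, precision is correct parses over all remaining parses, and ambiguity is remaining parses over tokens. The `Token` type, the tag names, and the toy rule `drop_nonfinal_verb` are hypothetical illustrations and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Token:
    surface: str
    parses: list[str]  # candidate parses still alive for this token
    gold: str          # hand-annotated correct parse (assumed available)


def apply_rule(tokens, rule):
    """Delete every parse the rule rejects, but never a token's last parse."""
    for i, tok in enumerate(tokens):
        kept = [p for p in tok.parses if not rule(tokens, i, p)]
        if kept:
            tok.parses = kept


def evaluate(tokens):
    """Recall, precision and ambiguity under the assumed definitions."""
    n_tokens = len(tokens)
    n_parses = sum(len(t.parses) for t in tokens)
    n_correct = sum(t.gold in t.parses for t in tokens)
    return {
        "recall": n_correct / n_tokens,
        "precision": n_correct / n_parses,
        "ambiguity": n_parses / n_tokens,
    }


def drop_nonfinal_verb(tokens, i, parse):
    # Hypothetical toy constraint: reject a Verb reading unless sentence-final.
    return parse == "Verb" and i < len(tokens) - 1


sentence = [
    Token("w1", ["Noun", "Verb"], gold="Noun"),
    Token("w2", ["Adj", "Noun"], gold="Adj"),
    Token("w3", ["Verb"], gold="Verb"),
]

before = evaluate(sentence)
apply_rule(sentence, drop_nonfinal_verb)
after = evaluate(sentence)
print(before)
print(after)
```

In this toy sentence the rule removes one wrong parse, so precision rises and ambiguity falls while recall is unchanged, which is the behaviour the abstract describes for rules tuned to improve precision without sacrificing recall.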
Similar resources
Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation
This paper presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology, specifically agglutinative languages with productive inflectional and derivational morphological phenomena. In certain respects, our approach has been motivated by Brill's recent work (Brill, 1995b), but with the observation that his transformational approach is not ...
Morphological Disambiguation by Voting Constraints
We present a constraint-based morphological disambiguation system in which individual constraints vote on matching morphological parses, and disambiguation of all the tokens in a sentence is performed at the end by selecting parses that receive the highest votes. This constraint application paradigm makes the outcome of the disambiguation independent of the rule sequence, and hence relieves the...
Combining Stochastic and Rule-Based Methods for Disambiguation in Agglutinative Languages
In this paper we present the results of the combination of stochastic and rule-based disambiguation methods applied to the Basque language. The methods we have used in disambiguation are the Constraint Grammar formalism and an HMM-based tagger developed within the MULTEXT project. As Basque is an agglutinative language, a morphological analyser is needed to attach all possible readings to each word. T...
A Bare-bones Constraint Grammar
This paper presents a solution for overcoming the lexical resource gap when mounting rule-based Constraint Grammar systems for minor languages, or in the face of licensing and financing limitations. We investigate how the performance of a CG disambiguation grammar responds to shifting input parameters, among them lexicon limitations of various degrees, the lack of a morphological analyzer or both....
WSD as a Distributed Constraint Optimization Problem
This work models the Word Sense Disambiguation (WSD) problem as a Distributed Constraint Optimization Problem (DCOP). To model WSD as a DCOP, we view information from various knowledge sources as constraints. DCOP algorithms have the remarkable property of jointly maximizing over a wide range of utility functions associated with these constraints. We show how utility functions can be designed for var...
Journal: CoRR
Volume: cmp-lg/9607030
Publication date: 1996